BaBar has recently deployed a new event data format referred to as the Mini. The Mini uses efficient packing 
and aggressive noise suppression to represent the average reconstructed BaBar event in under 7 KBytes. The 
Mini packs detector information into simple transient data objects, which are then aggregated into roughly 
10 composite persistent objects per event. The Mini currently uses Objectivity persistence, and it is being 
ported to use Root persistence. The Mini contains enough information to support detailed detector studies, 
while remaining small and fast enough to be used directly in physics analysis. Mini output is customizable, 
allowing users either to truncate unnecessary content or to add content, depending on their needs. The Mini has 
now replaced three older formats as the primary output of BaBar event reconstruction. A reduced form of the 
Mini will soon replace the physics analysis format as well, giving BaBar a single, flexible event data format 
covering all its needs. 



1. The BaBar Experiment 

BaBar is a multi-purpose detector operating at the 
PEP-II asymmetric B Factory. BaBar has been taking 
data at and near the Υ(4S) resonance since 1999, 
and has accumulated roughly 110 fb^-1 of luminosity 
to date. BaBar is fairly typical of modern High 
Energy Physics apparatus, consisting of several 
quasi-independent detector subsystems arranged roughly 
concentrically about the e+e- interaction point. The 
innermost subsystem is the Silicon Vertex Tracker 
(Svt), with roughly 150K readout channels. Outside 
the Svt is the Drift Chamber (Dch), with roughly 7K 
readout channels. Outside the Dch is the Cherenkov 
Detector (Drc), with roughly 11K readout channels. 
Outside the Drc is an Electromagnetic Calorimeter 
(Emc), with roughly 7K readout channels. Outside 
the Emc is the Instrumented Flux Return (Ifr), with 
roughly 60K readout channels. 

BaBar was an early adopter of C++ and OO programming 
in HEP, and the vast majority of our software 
is written in C++. BaBar has used Objectivity 
as the primary technology for storing event data; 
however, we are planning to change to a Root based 
event store by the end of 2003. 



2. BaBar Event Data Format History 

BaBar's original software design proposed several 
complementary event data formats, as described 
in table I. These formats were intended to satisfy 
different use cases, from quality control to reconstruction 
to calibration to physics analysis, with each format 
optimized for some specific purposes. Each format was 
written to separate Objectivity databases (files), so 
that they could be accessed and managed independently. 
The Raw format is an Objectivity transcription 
of the raw data read out by the detector online 
system, and was intended to be used as the input 
to the reconstruction chain. The Rec format represents 
the reconstructed physics objects, and was intended 
to be used for detector studies, for detailed 
analysis, and for single event display. The Esd format 
is a summary of the reconstruction results, and 
was intended to be the primary format used for 
high-statistics physics analysis. The Aod format was 
intended to store highly processed information specific 
to physics analysis. The Tag format was intended to 
store booleans to index and quickly select events. The 
Hdr format allows events to 'borrow' some subsystem 
data from other events, and was intended to support 
partial re-processing of individual subsystems. 

Format   Design Size   Actual Size   Usage 
Raw      25 KBytes     50 KBytes     Unused 
Rec      100 KBytes    120 KBytes    Unused 
Esd      10 KBytes     7 KBytes      Unused 
Aod      1 KByte       3 KBytes      Analysis 
Tag      100 Bytes     1 KByte       Selection 
Hdr                    4 KBytes      Navigation 

Table I BaBar Objectivity event data formats circa 2001. 
Raw refers to raw data, Rec to reconstructed data, Esd to 
event summary data, Aod to analysis data, Tag to event 
selection data, and Hdr to event header data. 

By 2001, these data formats and their usage in 
BaBar had stabilized. As shown in table I, many 
of the data formats were not actually used. Additionally, 
the Aod and Tag formats were considerably larger 
than foreseen, and had taken on different roles than 
originally intended. Subsequent sections explain why 
the formats were not used according to the original 
design, and how that led to the development of the 
Mini. 



2.1. The BaBar Persistence Design 

BaBar's original persistence design can be summarized 
as translating transient objects and transient 
object relationships into equivalent persistent 
objects and persistent references, as illustrated in 
figure 1 for the specific case of reconstructed tracks. 
The persistent objects were clustered into the various 
databases according to how it was anticipated 
they would be used. This design established the 
now-standard transient -> persistent -> transient 
paradigm in a straightforward way. This design allowed 
analysis jobs running on Esd data to retrieve 
reconstruction details about objects on demand, by 
following a link back into the Rec database. This 
was considered an important example of how an OO 
database event store might provide significant new 
functionality compared to sequentially organized data 
storage technologies. 

The literal translation of complex transient object 
trees to persistent object trees resulted in a fragmented 
structure, where different parts of a single 
physics object (a track in figure 1) were distributed 
across several databases. This effectively coupled the 
data formats and database files. For instance, a job 
reading tracks from Esd depended on the Rec database 
to provide the top level tracking persistent object. 
This coupling added enormously to the I/O burden and 
the disk footprint of an analysis job running on Esd. 

Similarly, the Rec format design required that tran- 
sient objects be rebuilt from their constituent Raw 
data. This coupled the Rec format to the Raw, and 
required that a job reading Rec data pull in essentially 
the entire reconstruction code base. A Rec job thus 
consumed a similar amount of resources (cpu, mem- 
ory, and disk) as the original reconstruction job. 

An additional difficulty in accessing the Rec format 
was that the large size of the Rec databases precluded 
storing them on disk. Instead, they were accessible 
only through staging. As the staging space at SLAC 
was originally very limited, dynamic staging through 
the Objectivity HPSS interface was disabled, forcing 
users to stage Raw and Rec databases by hand. This 
tedious and error-prone operation proved impractical 
for the vast majority of BaBar physicists. 

BaBar originally considered having the online sys- 
tem write raw event data directly in Objectivity Raw 
format. However, since OO database technology was 
new and relatively untested, a more conservative ap- 
proach was taken, where the online system writes a 
flat file version of the raw data, which can then be 
transcribed into Objectivity Raw format. It was then 
found to be more efficient to reconstruct events by di- 
rectly reading the online raw data. The Raw format 
was thus recast as an output instead of an input of 
reconstruction. 

The Raw format was used to pass data between the 
BaBar simulation executable and its reconstruction 
executable. In 2002 BaBar developed a monolithic 
simulation plus reconstruction executable, which elim- 
inated the need for Raw as intermediate storage. This 
monolithic simulation executable has been used for official 
BaBar Monte Carlo generation since early 2003. 

2.2. The Unused Formats 

The poor performance of jobs reading the Rec for- 
mat, coupled with the difficulty of accessing Rec and 
Raw data, made them nearly impossible to use for 
analysis or detector studies. Only a handful of physi- 
cists on BaBar ever made use of either of these data 
formats, and those uses typically involved very small 
event samples. Instead, most calibration and moni- 
toring tasks run on raw online data, invoking recon- 
struction code as necessary to build objects, or on the 
expanded Aod format. 

The Esd format was not finished in time for BaBar's 
first data, partly because its development was given 
lower priority compared to developing the Raw and 
Rec formats. It was also felt that some experience with 
the BaBar detector and data analysis was necessary 
before the Esd could be correctly designed to meet 
the needs of the experiment. It was instead foreseen that 
BaBar's first data would be analyzed using the Rec 
format, from which experience the Esd format could 
be completed. Since BaBar never used Rec to analyze 
first data (see section 2.5), the Esd format was never 
completed and never used. 

The capability of events to borrow content from 
other events via the Hdr format was also never used, 
mostly because BaBar never encountered a situation 
which required partial re-processing. The problem of 
managing the dependent event databases this would 
have created was also never addressed. The large size 
of the Hdr format was principally due to large string 
arrays describing component names. The Hdr was recently 
redesigned, greatly reducing its size (see [5]). 

2.3. The Evolution of Aod and Tag 

Since it was imperative that BaBar start develop- 
ing its analysis procedures even before first data, and 
since the formats originally intended to be used for 
analysis were effectively unusable, BaBar decided to 
expand the Aod format so that it could be used for 
physics analysis. This was implemented by includ- 
ing a 'four-vector' summary of the reconstruction re- 
sults, together with 'quality values' to describe some 
detector-specific details. 

The Aod format was spectacularly successful in en- 
abling analysis of BaBar's first data, allowing impor- 
tant physics results to be produced in a timely way. 
The Aod has since been BaBar's primary physics anal- 
ysis format, and it underlies all the physics results 
published to date. This format does however have 
several limitations. For one, the only track fit result 
stored in Aod assumes the particle which generated 
the track was a π (BaBar reconstruction provides 
track fit results for all five stable particle 
hypotheses). Because the energy loss due to material 
interactions is mass-dependent, this limitation 
introduces a small momentum bias when the particle 
is not a π. 

Figure 1: Diagram of the object tree representing a reconstructed track in BaBar, on the left in transient form, and on 
the right converted to persistent objects according to BaBar's original persistence model. The persistent objects are 
grouped according to which database they were stored in (the database labels are explained in section 

Another limitation of Aod is that it reduces the detector 
information to 'a set of numbers' extracted from 
reconstruction objects. This greatly reduces the benefits 
which BaBar might have obtained from 
using OO design and interfaces in analysis. It also 
effectively isolated BaBar analysis from reconstruction, 
making it impossible to port code developments from 
one side to the other. 

The Aod format also provides a rigid persistent rep- 
resentation of an event, with no way to tailor its con- 
tents for specific use cases. Consequently, most BaBar 
analyses operate by dumping the Aod format into an 
ntuple, adding data content according to their needs. 
This effectively doubles BaBar's analysis data storage 
needs. It also decouples analyses from each other, as 
different analysis working groups have developed 
different ntuple representations of the Aod format. The 
redundancy and inefficiency of this analysis methodology 
was a strong motivation for BaBar's new computing 
model, described in a later section. 

The Tag format also evolved when confronted with 
first data. To provide more flexibility when selecting 
events, the Tag format was expanded to include float- 
ing point and integer values as well as booleans. Thus 
the intent of the data formats was 'pushed down' one 
level compared with the original design, with the Aod 
taking the role intended originally for Esd, and Tag 
taking the role intended for Aod. 

By contrast with Raw, Rec, and Esd, the Aod and 
Tag formats were developed to be completely inde- 
pendent of the reconstruction objects. This avoided 
the interdependency problems of the unused formats, 
at the cost of allowing no way to navigate between 
physics analysis objects and the reconstructed and/or 
raw data from which they were derived. 

2.4. The Data Format Gap 

Because of the evolution of BaBar's data formats, a 
large gap had developed, with no practical way to access 
information between raw online data and physics 
4-vectors. This gap made performing routine functions 
like calibrations and detector diagnostics difficult 
and time consuming. The gap also severely limited 
the ability to study detector effects in physics 
analysis, and prevented BaBar from developing 
a usable single event display. The data format 
gap was first officially recognized in 2000 in the report 
of an internal review of BaBar computing [6]. 

The Svt provides one example of how the data for- 
mat gap caused problems. In order to obtain opti- 
mal tracking resolution, the positions of the Svt wafers 
(alignment) must be derived from the data. The Svt 
alignment procedure needs both low-level data (hits) 
and high-level data (tracks) to perform this task. Be- 
cause of the data format gap, it was found that the 
most efficient way of doing this was to read raw data, 
and reconstruct the tracks in the alignment job. The 
alignment job was therefore very slow, and the total 
procedure had a turnaround time of roughly 1 month. 
This was found to be longer than the time interval over 
which the Svt wafer positions were stable. The long 
turnaround time also stifled development of the alignment 
procedure, as it took too long to test changes. 
The net result was that BaBar used a poor alignment 
of the Svt in reconstructing early data, degrading 
the effective resolution of the detector, and introducing 
sizeable systematic errors in many physics analyses. 
Svt misalignment caused the dominant systematic 
error in BaBar's first sin2β publication. 

2.5. The Origins of the Mini 

To solve the Svt alignment problem, a new data 
format was developed for storing tracks. This new 
format stored tracks together with their hit data in 
a compact, flexible, and efficient structure. This new 
track persistence format was used to design a new Svt 
alignment procedure, which reduced the turnaround 
time to roughly 1 day, and which produced measurably 
better physics results. The success of the new 
track data format and the new alignment procedure 
inspired a larger effort to develop a new data format 
for all of BaBar based on the same basic design. This 
new format was referred to as the Mini. 

The Mini development project was officially begun 
in early 2001. A prototype of the Mini was produced 
in a complete re-processing of the BaBar data sam- 
ple started in 2001. The prototype Mini was used to 
perform many detector studies, and to refine the Mini 
design. Unfortunately the Mini prototype did not in- 
clude any information from the Emc, and so was not 
usable for physics analysis. The first complete version 
of the Mini was released in early 2002. The design 
and implementation of the Mini are described in the 
remaining sections. 



3. The BaBar Mini Design Goals 

The main design goal of the Mini was to persist the 
results of event reconstruction. To avoid the prob- 
lems of the Rec format (which had the same goal), the 
Mini was also required to be small (< 10 KBytes per 
event), self-contained (no references to objects outside 
the Mini), and fast (support reading at roughly 20 Hz, 
equivalent to reading the Aod format). 

Another goal of the Mini design was to provide ac- 
cess to sufficient detector detail to support standard 
calibration, alignment, diagnostics, and algorithm de- 
velopment. This capability would also allow the Mini 
to support a detailed single event display. 

The Mini was required to be able to follow changes 
in Conditions (alignment and calibration parameters), 
so that users could benefit from improved parameters 
without having to wait for data re-processing. This 
requirement would also allow analysis users to eas- 
ily propagate calibration and alignment uncertainties 
to systematic errors in their analysis, by simply re- 
running with altered parameters. 

To maintain compatibility with reconstruction, the 
Mini was required to provide access through the in- 
terfaces of actual reconstruction classes, without any 
significant loss in accuracy or content compared to the 
original reconstruction results. 

To support specialized use cases, the Mini was de- 
signed to allow users to customize the persistent con- 
tent according to their needs. This feature is relied on 
heavily in BaBar's new computing model, as a way of 
reducing the need to dump data into ntuples in order 
to do analysis. 

To leverage BaBar's huge physics analysis code 
base, the Mini was required to be compatible with the 
existing analysis framework. Explicitly, the goal was 
that an average user be able to convert their analysis 
job to read the Mini instead of Aod without chang- 
ing any physics-related code, and that the results ob- 
tained be equivalent (within floating point precision) 
to those obtained with the original Aod format job. 



4. Implementation of the Mini 

The Mini design goals require both access to the 
full detector detail through reconstruction interfaces, 
and full compatibility with, and performance similar 
to, the Aod (4-vector summary) format for 
physics analysis. The Mini satisfies these contradictory 
requirements by storing both high-level objects 
(tracks, calorimeter clusters, Cherenkov rings, particle 
ID, etc), and the low-level objects (Dch hits, Emc 
crystals, Drc phototubes, etc) from which they were 
made. This results in some redundancy, as some content 
of the high-level objects can also be extracted 
from their constituent low-level objects. Redundancy 
is generally considered a bad idea in data storage, as 
it can cause consistency problems. It was accepted 
for the Mini design as it afforded a large (factor of 
10) performance improvement when reading high-level 
objects (see [7] for details), and because the Mini access 
mechanism includes safeguards against inconsistent 
data usage (see section 6 for details). 

4.1. High-level objects in the Mini 

High-level objects in the Mini store the set of ref- 
erences to the low-level objects from which they were 
built, thus preserving the essential information of the 
pattern-recognition algorithm. High-level objects also 
store references to the other high-level objects they 
depend on. For instance, Drc rings store a list of Drc 
hit references, plus a reference to the track used to 
seed and fit the ring. 

High-level objects also store the results of cpu- 
intensive functions which their transient class sup- 
ports. For instance, track objects store the results 
of running the Kalman filter fit. Because these func- 
tions use the associated low-level objects, these stored 
results are redundant with the low-level objects them- 
selves. Because these functions were invoked during 
reconstruction, stored results implicitly depend on the 
Conditions which were used when reconstruction was 
run. Thus, stored results of high-level objects do not 
follow changing Conditions. To follow new Conditions, 
the stored results cannot be used, and the original 
functions must be called on rebuilt transient objects. 
Details of how the Mini can be configured to use 
(or not use) stored function results are described in 
section 6. 
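
The difference between using a stored function result and rebuilding it can be sketched in a few lines. This is only an illustration under stated assumptions, not BaBar code: the `Conditions` and `Track` classes, the toy `fit` function (a simple mean standing in for the Kalman fit), and the `use_stored` switch are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Conditions:
    """Hypothetical stand-in for one set of alignment/calibration constants."""
    version: int
    offset: float  # toy correction added to every hit position


@dataclass
class Track:
    hits: List[float]   # low-level data (toy 1-D hit positions)
    stored_fit: float   # function result cached when reconstruction was run


def fit(hits: List[float], cond: Conditions) -> float:
    """Toy 'fit' (mean corrected position); stands in for the real Kalman fit."""
    return sum(h + cond.offset for h in hits) / len(hits)


def reconstruct(hits: List[float], cond: Conditions) -> Track:
    """Run the expensive function once and store its result with the track."""
    return Track(hits=hits, stored_fit=fit(hits, cond))


def fit_result(track: Track, cond: Conditions, use_stored: bool) -> float:
    """Stored result: fast, but frozen at the reconstruction-time Conditions.
    Rebuilt result: slower, but follows whatever Conditions are supplied."""
    return track.stored_fit if use_stored else fit(track.hits, cond)


old = Conditions(version=1, offset=0.0)
new = Conditions(version=2, offset=0.5)
track = reconstruct([1.0, 2.0, 3.0], old)
cached = fit_result(track, new, use_stored=True)    # still reflects `old`
refit = fit_result(track, new, use_stored=False)    # reflects `new`
```

In this toy, `cached` keeps the value computed with the old constants, while `refit` shifts with the new offset, which is the behaviour the text attributes to stored versus rebuilt results.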

4.2. Low-level objects on the Mini 

Figure 2: Single event display of a typical BaBar multi-hadron event in Aod format on the left, and Mini format on the 
right. In Aod, tracks are modeled as perfect helices, and neutral objects as 4-vectors. 

Where possible, the low-level objects in the Mini 
store raw detector readout information instead of 
physical quantities. Thus Dch hits are stored as TDC 
values and wire numbers instead of physical times and 
positions. Physical quantities are then computed from 
the raw data on the Mini using conversion algorithms 
and Conditions data, as implemented in the transient 
low-level reconstruction class accessor functions. This 
allows the Mini to follow Conditions changes, and to 
provide consistent results with reconstruction. 
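
As a rough illustration of storing readout values and converting them on access, the sketch below keeps only a wire number and a TDC count, and applies calibration constants when a physical time is requested. The linear conversion, the class names, and the numerical constants are assumptions made for the example; the actual Dch conversion algorithms are not given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DchCalibration:
    """Hypothetical Conditions record; field names and values are illustrative."""
    ns_per_count: float   # assumed TDC bin width
    t0_ns: float          # assumed time offset for the channel


@dataclass(frozen=True)
class PackedDchHit:
    """What is stored: raw readout values only."""
    wire: int
    tdc: int


def drift_time_ns(hit: PackedDchHit, calib: DchCalibration) -> float:
    """Convert raw TDC counts to a physical time using the supplied Conditions."""
    return hit.tdc * calib.ns_per_count - calib.t0_ns


hit = PackedDchHit(wire=1234, tdc=400)
at_reco = DchCalibration(ns_per_count=0.5, t0_ns=20.0)
improved = DchCalibration(ns_per_count=0.5, t0_ns=22.0)

time_at_reco = drift_time_ns(hit, at_reco)
time_improved = drift_time_ns(hit, improved)
```

Because the stored hit never changes, rerunning the accessor with an improved calibration yields an updated time without re-processing the data.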

For some subsystems, the raw detector data are 
very large, and they must be compressed before being 
stored on the Mini. In these cases, the compressed 
information is still stored in detector units. For in- 
stance, Svt hits are compressed to store the average 
cluster position instead of all the individual strips in 
a cluster, but the average position is expressed in strip 
coordinates. 
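
A minimal sketch of this kind of compression is shown below. It assumes the average is weighted by strip pulse height, which the text does not specify; the function name and the input layout are likewise hypothetical.

```python
from typing import List, Tuple


def cluster_position(strips: List[Tuple[int, int]]) -> float:
    """Return the mean strip number of one cluster, in strip units.

    `strips` is a list of (strip_number, pulse_height) pairs. Weighting by
    pulse height is an assumption of this sketch.
    """
    total = sum(pulse_height for _, pulse_height in strips)
    weighted = sum(strip * pulse_height for strip, pulse_height in strips)
    return weighted / total


# Three adjacent strips collapse into a single number in strip coordinates.
cluster = [(100, 10), (101, 30), (102, 10)]
position = cluster_position(cluster)
```

The result stays in detector units, so converting it to a physical position still goes through the alignment Conditions at read time.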

The Mini also stores a subset of low-level objects 
not associated to any high-level object. Monte Carlo 
and other studies have shown that many of the unas- 
sociated low-level objects were generated by particles 
produced directly or indirectly in the e+e- collision. 



Unassociated low-level objects can be used to iden- 
tify physics objects missed due to reconstruction in- 
efficiency, or to search for unusual physics signals not 
found by the standard reconstruction. Associated and 
unassociated low-level objects can also be combined to 
create a 'complete' set of low-level objects. This al- 
lows the Mini to be used to develop and test pattern 
recognition algorithms, or to be used as a source for 
partial re-processing. 

Unfortunately, most unassociated low-level objects 
in a typical BaBar event do not come from the e+e- 
collision. Storing all of them would therefore bloat the 
Mini and degrade its contents. Instead, only unassociated 
low-level objects which pass stringent quality 
cuts are stored on the Mini. For instance, only those 
unassociated Svt hits whose arrival time is consistent 
with the reconstructed event time are stored on the 
Mini. This cut reduces the number of unassociated Svt 
hits by roughly a factor of 20, while keeping roughly 
90% of the 'real' unassociated Svt hits. 
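
The timing cut can be pictured as a simple window around the event time. In the sketch below the `SvtHit` record, the function name, and the 50 ns window are hypothetical; the text does not quote the actual cut value.

```python
from typing import List, NamedTuple


class SvtHit(NamedTuple):
    strip: float     # cluster position in strip coordinates
    time_ns: float   # hit arrival time


def select_unassociated(hits: List[SvtHit], event_time_ns: float,
                        window_ns: float) -> List[SvtHit]:
    """Keep only hits whose time lies within +-window_ns of the event time."""
    return [h for h in hits if abs(h.time_ns - event_time_ns) <= window_ns]


candidates = [SvtHit(10.0, 5.0), SvtHit(20.0, 300.0), SvtHit(30.0, -40.0)]
kept = select_unassociated(candidates, event_time_ns=0.0, window_ns=50.0)
```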



5. Mini Persistence 

The Mini was first implemented using Objectivity 
persistence, and it has recently been ported to use 
Root persistence, as part of BaBar's new computing 
model (described in a later section). A large part of the 
Mini's success was due to strict adherence to a few basic 
persistence design principles, described in the following 
sections. 

The Mini persistence is controlled by the standard 
BaBar persistence mechanism. A dedicated loader 
module is run for each detector subsystem, which 
creates the scribes responsible for translating specific 
transient objects into their persistent counterpart. 
The event key by which a scribe identifies its transient 
object is configurable through the loader module Tcl 
interface. Thus the user can control the content of 
the Mini by choosing which loader modules to run, 
which scribes to create, and which transient objects 
the scribes should convert, configurable through con- 
trol scripts on standard executables. 

The configurability of Mini persistence was used to 
improve the efficiency of the Svt alignment procedure 
(see section 2.5). By reading a custom reduced Mini 
holding just selected Svt track information, the itera- 
tive part of the alignment procedure was sped up from 
several hours to just 10 minutes per iteration. As con- 
vergence required hundreds of iterations, this speedup 
was essential for producing the 23 different Svt align- 
ment sets used in the 2002 data re-processing. 

For technical reasons, the Mini was not placed in a 
new database. Instead, the Esd database was cleared 
of all previous content, and the Mini was placed there. 
The Mini thus completely replaced the original Esd 
format. 

5.1. The Persistent Composite Design 

The Mini persistence design is based on the persis- 
tent composite design pattern, in which a single persis- 
tent object holds the data for a collection of transient 
objects of a given type. Using this design, the Mini 
stores the contents of a BaBar event in just 11 per- 
sistent objects, minimizing the impact of the 12 bytes 
per persistent object Objectivity overhead. A graph- 
ical representation of the Mini persistence design is 
shown in figure 3. 

In the persistent composite design pattern, the con- 
tents of a transient object collection are stored in per- 
sistent arrays (ooVArrays for Objectivity) of embedded 
objects, which have a one <-> one relationship with 
transient objects. 



Embedded classes translate to and from their tran- 
sient counterparts, but otherwise provide no interface. 
They are implemented as simple structs of primitive 
data types, with no dependence on any persistence 
technology. Because they are persistence-free, the 
same embedded classes can be used by different per- 
sistence mechanisms, making it easy to port persistent 
composite classes to other persistence technologies. 

Associations between objects on the Mini are stored 
in the persistent composite objects as a single refer- 
ence (OORef for Objectivity) to other persistent com- 
posites. Specific objects in other composites are then 
referenced as the index into the corresponding embed- 
ded array. This results in much less overhead than 
storing explicit references, as an index (typically 2 
bytes) is much smaller than an OORef (12 bytes). 
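
The pattern described in this section can be sketched as follows. The class names are hypothetical and plain Python lists stand in for the persistent arrays; only the 12-byte reference and 2-byte index sizes are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmbeddedHit:
    """Plain struct of primitives; one per transient hit."""
    wire: int
    tdc: int


@dataclass
class EmbeddedTrack:
    """Plain struct; refers to its hits by index, not by persistent reference."""
    hit_indices: List[int]


@dataclass
class HitComposite:
    """One persistent object holding every hit of the event."""
    hits: List[EmbeddedHit] = field(default_factory=list)


@dataclass
class TrackComposite:
    """One persistent object holding every track of the event."""
    hit_composite: HitComposite   # the single composite-to-composite reference
    tracks: List[EmbeddedTrack] = field(default_factory=list)

    def hits_of(self, track_index: int) -> List[EmbeddedHit]:
        track = self.tracks[track_index]
        return [self.hit_composite.hits[i] for i in track.hit_indices]


REF_BYTES = 12    # size of an explicit persistent reference quoted in the text
INDEX_BYTES = 2   # typical index size quoted in the text


def association_bytes(n_links: int, indexed: bool) -> int:
    """Bytes needed to store n_links associations into one other composite."""
    if indexed:
        return REF_BYTES + n_links * INDEX_BYTES
    return n_links * REF_BYTES


hits = HitComposite([EmbeddedHit(wire=w, tdc=100 + w) for w in range(5)])
tracks = TrackComposite(hits, [EmbeddedTrack([0, 2]), EmbeddedTrack([1, 3, 4])])
```

With these numbers, 200 track-to-hit associations cost 412 bytes when stored as one reference plus indices, against 2400 bytes as explicit references.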

5.2. Data Packing on the Mini 

To minimize the size of the Mini, its data contents 
are packed, according to the following rules. 

• Boolean data are stored as a single bit. 

• Integer data are stored using as many bits as 
required by their range. 

• Float data are packed and stored as integers. 
The Least Significant Bit (LSB) of the packed data 
corresponds to roughly 1% of the intrinsic 
detector resolution of the quantity being 
stored. Float data with an extreme natural 
range are packed logarithmically, using an 
algorithm which is locally flat to avoid binning 
artifacts in histograms (see figure 4). 

• Packed integer and float values are truncated at 
physically reasonable ranges, not 'worst possi- 
ble' ranges. Values beyond the physically rea- 
sonable range are flagged as under or overflows. 

• Strings are stored as a key (integer) in a 
string-to-integer map. The map is stored outside 
the Mini event data. 

• Small data fields are combined (bitwise OR) to 
fill a standard type (char, short, or long) word. 

• Data members of embedded classes are all 
aligned to either char, short, or long word 
boundaries (one choice per class), to ensure that 
embedded object arrays are compact in memory. 

• To avoid the Objectivity overhead of storing 
and retrieving virtual tables, embedded classes 
have no virtual functions, including no virtual 
destructor. 

• Direct data members of persistent composite 
classes are aligned to long word boundaries, to 
be consistent with Objectivity persistent object 
alignment. 

• To avoid creating persistent memory fragments, 
variable arrays (ooVArrays) are sized exactly 
once, either on initialization or in the constructor 
body. 

Figure 3: Diagram of the object tree representing a reconstructed track in BaBar, on the left in transient form, and on 
the right converted to Mini persistent format. The Mini stores all tracks in an event in a single persistent object. 
Track-specific details are described by the embedded objects stored in the different persistent arrays. 
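
Several of the packing rules listed above (range-limited integers, floats stored as integers with an LSB tied to the resolution, under/overflow flags, and bitwise combination of small fields) can be illustrated together. This is a minimal sketch under assumptions: the `FloatPacker` class, the `combine` and `extract` helpers, and the choice of reserving the lowest and highest codes as flags are hypothetical, not the Mini's actual implementation.

```python
class FloatPacker:
    """Linear float-to-integer packing with reserved under/overflow codes."""

    def __init__(self, lo, hi, resolution, lsb_fraction=0.01):
        self.lo = lo
        self.lsb = resolution * lsb_fraction        # one code step
        self.n_bins = int((hi - lo) / self.lsb + 0.5)
        self.underflow = 0                          # assumed flag convention
        self.overflow = self.n_bins + 1
        # bits required by the range: n_bins in-range codes plus two flags
        self.bits = (self.n_bins + 1).bit_length()

    def pack(self, x):
        if x < self.lo:
            return self.underflow
        bin_index = int((x - self.lo) / self.lsb)
        if bin_index >= self.n_bins:
            return self.overflow
        return bin_index + 1

    def unpack(self, code):
        """Return the bin center for an in-range code."""
        return self.lo + (code - 0.5) * self.lsb


def combine(fields):
    """OR together (value, n_bits) fields, first field in the lowest bits."""
    word, shift = 0, 0
    for value, n_bits in fields:
        if not 0 <= value < (1 << n_bits):
            raise ValueError("value does not fit in its field")
        word |= value << shift
        shift += n_bits
    return word, shift


def extract(word, offset, n_bits):
    """Read back one field from a combined word."""
    return (word >> offset) & ((1 << n_bits) - 1)


# A quantity with resolution 1.0, limited to the 'reasonable' range [0, 10).
energy = FloatPacker(lo=0.0, hi=10.0, resolution=1.0)
code = energy.pack(3.14159)
# One boolean (1 bit), one small integer (3 bits), and the packed float.
word, used_bits = combine([(1, 1), (5, 3), (code, energy.bits)])
```

Here the three fields occupy 14 bits, so they fit in a single short word, and the unpacked float differs from the input by at most half an LSB.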



The Mini user can also control how the 
BaBar Conditions Database is accessed. For 
instance, a user can configure their Mini job to use 
the same Conditions as were used when the data were 
originally processed, or the most recent Conditions, 
or even to override the Conditions Database and 
use explicitly-provided constants. Conditions access 
configuration is most relevant to refit and raw modes. 



6. Accessing Mini Data 

As described in the previous section, the Mini per- 
sists several levels of data which overlap in content. 
The Mini is designed so that the user must decide 
which level of detail is appropriate when reading back 
the Mini. In making this decision, the user must balance 
the greater detail and (potentially) greater accuracy 
which are available when reading low-level 
data, against the greater speed possible when read- 
ing high-level data. The Mini persistence provides a 
very precise degree of access control, so that some ob- 
jects may be read with high precision while others are 
read with lower precision. Similarly, it is possible to 
read an event initially at low precision, and later up- 
grade some or all objects to higher precision once the 
event or the objects in it pass cuts. Maintaining co- 
herent and correct data contents under these general 
conditions is however difficult, and involves a level of 
expertise beyond that of the average user. 

To make it easier for users to correctly configure 
reading the Mini, a set of self-consistent access modes 
is provided which roughly spans the available options. 
While some users may need to optimize their Mini 
jobs by directly controlling the persistent access, it is 
expected that most Mini users will choose one of these 
five access modes: micro, cache, extend, refit, and raw. 
The access mode is set in a user job through the 
levelOfDetail global Tcl variable, which is then used by 
the sequences which read and prepare the Mini data 
for analysis. The specifics of these different modes are 
described below. A comparison of the access speed in 
these different modes is given in table Hill 



6.1. Cache Mode 

Cache mode refers to reading the Mini so that high- 
level objects are built from the stored function results 
instead of low-level data. Only a summary of the low- 
level information can be obtained in cache mode. For 
instance, in cache mode, the track transient object 
can provide the number and logical identity of the 
hits from which it was fit, but it cannot produce the 
actual hit objects. Since a cache mode job doesn't 
read or process any low-level data, it is much faster 
than a refit job run on the same data. 

In cache mode, all the stored track fits can be used. 
The default version of the Mini stores all the unique 
mass hypothesis Kalman fits, evaluated at their point 
of closest approach to the Z axis, plus the π fit evaluated 
where the track exits the tracking volume. 



6.2. Micro Mode 

Micro mode is a variant of cache mode where some 
features are turned off, in order to make the Mini be- 
have more like the Aod format. For instance, since Aod 
stores only π track fits, in micro mode the Mini track 
provides only the π fit. Micro mode is intended to 
make it easy to compare and validate the Mini against 
the Aod format. Since micro mode is no faster than 
cache mode, and yet returns less accurate values, it is 
not recommended for use in physics analysis. 
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Packing_Binsize(d) - 
Packing_RMS(d) 



Transverse Track Impact Parameter (cm) 

Figure 4: Resolution of the globally-logarithmic, locally flat packing algorithm used in the Mini, as applied to packing 
the track transverse impact parameter. This quantity has a detector resolution of roughly 10 microns, and range of 
values from to 80 cm. The packing algorithm employed is extremely efficient to unpack, involving only 2 floating 
point operations (one addition and one multiplication). 
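
The text does not spell out the packing algorithm, so the following Python sketch is only one plausible way to obtain a globally-logarithmic, locally flat binning whose unpacking costs one addition and one multiplication. All parameter values (MANTISSA_BITS, N_OCTAVES, V_MAX) and function names are assumptions chosen for illustration. The range is divided into octaves (the logarithmic part), each octave is divided into equal-width bins (the flat part), and the per-octave bin width is precomputed in a lookup table.

```python
import math

MANTISSA_BITS = 6              # assumed: 64 flat bins per octave
N_OCTAVES = 16                 # assumed: 16 octaves below V_MAX
V_MAX = 80.0                   # upper end of the range, in cm
N_MANT = 1 << MANTISSA_BITS
V_MIN = V_MAX / 2 ** N_OCTAVES  # smallest value that is resolved

# Bin width inside each octave, computed once.
BIN_WIDTH = [V_MIN * 2.0 ** e / N_MANT for e in range(N_OCTAVES)]
OFFSET = N_MANT + 0.5          # shifts the mantissa to the bin centre


def pack(value):
    """Pack a non-negative value into a 10-bit integer code."""
    ratio = value / V_MIN
    if ratio < 1.0:
        return 0                                  # underflow: first bin
    frac, exp = math.frexp(ratio)                 # ratio = frac * 2**exp
    octave = exp - 1                              # 2*frac lies in [1, 2)
    if octave >= N_OCTAVES:
        return (N_OCTAVES << MANTISSA_BITS) - 1   # overflow: last bin
    mantissa = int((2.0 * frac - 1.0) * N_MANT)
    return (octave << MANTISSA_BITS) | mantissa


def unpack(code):
    """Return the bin centre: one addition and one multiplication."""
    octave = code >> MANTISSA_BITS
    mantissa = code & (N_MANT - 1)
    return (mantissa + OFFSET) * BIN_WIDTH[octave]
```

With these assumed parameters the relative packing error inside the covered range is bounded by 1/(2*N_MANT), since the bin width grows in proportion to the value from one octave to the next.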



6.3. Extend Mode 

Extend mode is a variant of cache mode in which 
the validity range of a track is extended from the 
range of the fit result stored on the Mini, up to the 
first hit. Otherwise, extend mode behaves exactly 
as cache mode. The persistent data read in extend 
mode are exactly the same as in cache mode, but since 
more tracking functions are called, extend is some- 
what slower to read than cache. 

In extend mode, the fit results stored on the Mini 
are interpreted as a multi-dimensional 'hit', constrain- 
ing all the track parameters to the stored fit values. 
These 'constraint hits' are used to create a Kalman 
track fit object, which is an instance of the same 
Kalman fit class used to fit tracks in BaBar recon- 
struction. The Kalman fit adds the effects of passive 
material and magnetic field distortions as the track 
traverses the detector, extending the range over
which the stored track can provide physically accu- 
rate parameters. Since hits are not read in extend 
mode, extended tracks are valid from the origin out 
to the first hit. 
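
The idea of a 'constraint hit' can be sketched with a deliberately simplified scalar Kalman filter in Python. This is an illustration under strong assumptions, not the BaBar Kalman fit: it tracks a single parameter instead of the full set of track parameters, uses an identity transport, and the function names and numerical values are invented. The stored fit value and variance enter as a measurement that directly constrains the parameter; process noise standing in for passive material is then added as the track is propagated outward.

```python
def kalman_update(x, P, z, R):
    """Scalar Kalman update with measurement model H = 1."""
    K = P / (P + R)                 # Kalman gain
    x_new = x + K * (z - x)
    P_new = P * R / (P + R)         # algebraically equal to (1 - K) * P
    return x_new, P_new


def propagate(x, P, Q):
    """Identity transport that adds process noise Q (e.g. from material)."""
    return x, P + Q


# Stored fit result, interpreted as a 'constraint hit' (values are invented).
stored_value, stored_variance = 0.012, 1.0e-6

# Start from an uninformative state and apply the constraint hit.
x, P = 0.0, 1.0e6
x, P = kalman_update(x, P, stored_value, stored_variance)
constrained = (x, P)

# Extend the track through three hypothetical material layers.
for layer_noise in (2.0e-7, 2.0e-7, 5.0e-7):
    x, P = propagate(x, P, layer_noise)
extended = (x, P)
```

After the update the state reproduces the stored fit, and each propagation step inflates the variance, which is how the extended track carries a realistic uncertainty away from the point where the fit was stored.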

An example where extend mode is useful is recon- 
structing long-lived particles which decay outside the 
beampipe, such as K^0. In cache mode (and when using
the Aod format), these particles are vertexed using
fit results measured inside the beampipe. In extend
mode, track fit results are measured at the decay vertex,
so that the reconstructed parameters of the K^0
are more accurate and less biased.



6.4. Refit Mode 

In refit mode, the function results stored with the 
high-level objects are ignored, and high-level tran- 
sient objects are rebuilt from constituent low-level 
objects. Because refit mode involves reading more
data and performing more computation to create the
high-level objects, it is substantially slower than ei- 
ther cache or extend mode. Because new Conditions 
are read and used when rebuilding the transient ob- 
jects, a refit mode job can follow changing Conditions, 
or even changes in some reconstruction algorithms. 
Refit mode is intended to support detector studies, 
single event display, specialized analyses that depend 
on low-level data, and analyses that need to use new 
Conditions or algorithms. 

6.5. Raw Mode 

In raw mode, high-level objects stored on the Mini 
are ignored. Both assigned and unassigned low-level 
objects are read and combined together into 'complete'
lists, and the reconstruction pattern recognition
algorithms are run on those. Raw mode is intended 
to support development of pattern-recognition recon- 
struction algorithms, to support re-processing, and to 
support event mixing studies. Because raw mode in- 
vokes pattern recognition algorithms, it is slower than 
refit mode. 

While the high-level objects created when reading 
the Mini in raw mode are similar to those read in 
the other modes, they are not necessarily identical, as 
the initial sets of low-level objects are not exactly the 
same as those used when running reconstruction on 
raw online data. 

Raw mode is still under development as a user op- 
tion, though it has been tested in a limited form. 
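
The properties of the access modes described in this section can be summarised as a small Python data structure. The field and function names are hypothetical and only restate what the text says about each mode; the selection rule simply picks the fastest mode that still provides what a job needs, leaving out micro mode since it is not recommended for physics analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessMode:
    name: str
    uses_stored_fits: bool          # high-level objects built from stored results
    reads_low_level: bool           # hits and other low-level data are read
    runs_pattern_recognition: bool  # pattern recognition is re-run


MODES = {m.name: m for m in (
    AccessMode("micro", True, False, False),
    AccessMode("cache", True, False, False),
    AccessMode("extend", True, False, False),
    AccessMode("refit", False, True, False),
    AccessMode("raw", False, True, True),
)}


def cheapest_mode(need_hits=False, need_pattern_recognition=False):
    """Return the name of the fastest mode that satisfies the given needs."""
    if need_pattern_recognition:
        return "raw"
    if need_hits:
        return "refit"
    return "cache"
```

For example, cheapest_mode(need_hits=True) returns "refit", since hit objects are only read in refit and raw modes, and refit is the faster of the two.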



7. Performance of the Mini 

General performance numbers for the Mini, such as 
size on disk and read speed under various conditions 


Data                     Generic       Multi-hadron
Mini                     6.4 KBytes    10.0 KBytes
analysis reduced Mini    1.8 KBytes    3.2 KBytes
Aod                      1.8 KBytes    2.7 KBytes



Table II The average (compressed) size of BaBar events 
stored in Mini, analysis reduced Mini and Aod formats. 
Results for the analysis reduced Mini are based on a 
prototype. 



mode                     Generic     Multi Hadron
micro                    45 Hz       24 Hz
cache                    45 Hz       22 Hz
extend                   28 Hz       14 Hz
refit                    5.3 Hz      2.4 Hz
raw                      3.3 Hz*     1.0 Hz*
Aod                      246 Hz      173 Hz
analysis reduced Mini    96 Hz





Table III The event rate reading the Mini in different 
modes on a 1.4 GHz Pentium III Linux machine. The
times for raw mode were estimated using the BaBar 
reconstruction executable, as this Mini mode has not yet 
been fully implemented. Results for the analysis reduced 
Mini are based on a prototype. 

are listed in tables II and III. Performance of the Aod
format is given for comparison. As efforts to optimize
the read speed of the Mini have only just begun, these
numbers should be considered provisional. Table IV
gives a breakdown of where time is currently spent in
a typical Mini analysis job. This clearly shows that
unpacking data plays a very minor role in the performance.
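
The claim about unpacking can be made quantitative with a short Python calculation based on the fractions in table IV. The function name is hypothetical, and the bound used is the standard Amdahl's-law estimate: if an operation taking a fraction f of the job time were made infinitely fast, the whole job would speed up by at most 1/(1 - f).

```python
# Fraction of job time per operation, taken from table IV (cache mode).
TIME_FRACTION = {
    "Reconstruction transient creation + deletion": 0.35,
    "Objectivity data read": 0.25,
    "Physics interface adapter": 0.20,
    "Event loop overhead": 0.10,
    "Data unpacking": 0.001,
}


def max_speedup(fraction):
    """Upper bound on overall speedup if this fraction of time is eliminated."""
    return 1.0 / (1.0 - fraction)


# Best-case gain from optimising each operation in isolation.
SPEEDUP_BOUND = {op: max_speedup(f) for op, f in TIME_FRACTION.items()}
```

Eliminating data unpacking entirely would gain at most about 0.1%, whereas eliminating transient creation and deletion could gain up to a factor of about 1.5, which is why optimisation effort is better spent on the latter.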



8. BaBar's New Computing Model 

In April 2002 BaBar computing was reviewed by a 
combined internal and external review board. Among 



Operation                                       % time
Reconstruction transient creation + deletion    35
Objectivity data read                           25
Physics interface adapter                       20
Event loop overhead                             10
Data unpacking                                  0.1



Table IV The fraction of time spent in various 
operations when reading the Mini in cache mode in a 
standard physics analysis job. 



other recommendations, the report of this commit- 
tee || suggested that BaBar reconsider its Analy- 
sis Model in light of the opportunities offered by the 
Mini. In response to these recommendations, a new 
Computing Model was adopted by the collaboration 
in December 2002. This model introduces two ma- 
jor changes, first that the BaBar event store be con- 
verted to use Root persistence instead of Objectivity, 
and second that the existing physics analysis format 
(Aod) be replaced with a new format more consistent 
with the Mini. After some discussion, a reduced Mini 
customized for analysis has been chosen as the Aod 
format replacement. 

To replace the Aod format, the analysis reduced 
Mini must have a similar size on disk and read-back 
speed as Aod. The starting point for achieving this 
is to store only those quantities referenced in cache 
mode. Performance results from a prototype analysis
reduced Mini are given in tables II and III, showing
that it is similar to Aod. A major effort is now
underway at BaBar to improve the read-back performance.
Based on profiles of a standard analysis job,
the largest time sinks come from inefficiency in the
reconstruction code invoked when reading the Mini,
and in the analysis interface to the Mini (see table
IV). Based on the problems already identified, the
read speed is expected to increase by a factor of
between 2 and 10.

In the new BaBar Computing Model, the analysis 
reduced Mini will store only the cache mode informa- 
tion. The remainder of the Mini information will be 
stored in a separate file. The complete Mini will be 
accessed by reading both the reduced and remainder 
information. Thus BaBar will store event data in a 
coherent format, split into pieces specialized for anal- 
ysis and detector studies, with no redundancy and 
easy navigation between the two pieces. 

A requirement of the new computing model is that 
the Aod replacement be accessible interactively. As 
part of satisfying this requirement, BaBar has chosen 
Root as the persistence technology for the Aod re- 
placement, since it has been shown to work as a HEP 
event store technology both at BaBar and elsewhere, 
and because the Root/CINT interface is a standard 
interactive access mechanism. To satisfy this requirement,
the Mini is being ported to Root persistence.
The Mini Root persistent implementation uses base
classes developed at BaBar which allow interactive access
to packed data contents of embedded objects, by
dynamically linking class functions into Root [lfj.

A key feature of the new computing model is the 
ability to create custom output streams for physics 
groups, by exploiting the configurability of the Mini. 
Coupled with the interactive access capability afforded
by Root persistence, it is hoped that custom streams
can replace the Aod format dump ntuples used in most
analyses. This will substantially reduce the computing
and human resources used in analysis.

Because BaBar is a functioning experiment, the new
computing model must be introduced in a way that 
does not disrupt ongoing efforts, and quickly enough 
that its benefits can be exploited before the experi- 
ment ends. The plan is to develop and deploy the 
new computing model within calendar year 2003. 



ularly successful in enabling BaBar to produce high 
quality physics results, and which laid the foundation 
for developing the Mini. 



9. Conclusions

BaBar has introduced a new event data format referred
to as the Mini. This format addresses deficiencies
in BaBar's older formats. 

BaBar has just completed a full data re-processing, 
in which the complete Mini replaced the unused Raw, 
Rec, and Esd formats. This reduced the volume of 
data produced in reconstruction by roughly a factor of
10, significantly improving the efficiency of the event
reconstruction farm, and requiring half as many data 
servers compared to previous processings. BaBar's 
data storage costs were also cut by roughly the same 
factor of 10. 

BaBar is now starting to use the Mini for physics 
analysis. An ambitious new computing model has 
been adopted, in which a reduced form of the Mini will 
replace the current physics analysis format. When the 
new computing model is deployed in late 2003, BaBar 
will have a coherent event data format covering most 
of the needs of the experiment, finally satisfying the 
intent of the original format design. 